GCC Rust Monthly Report #17 May 2022

Thanks again to Open Source Security, Inc. and Embecosm for their ongoing support for this project.

Milestone Progress

Another busy month for Rust GCC; we have made a lot of progress in many different areas, some of which we will discuss more in our reports over the summer.

This report was meant to close out our imports and visibility milestone, but because Philbert spent time on bugs and, in particular, on the goal test case of Blake3, we have not hit this target date. However, in a broader context, one of our Google Summer of Code student projects directly helps our next milestone with both Arthur and Philbert, which means we should be able to mitigate the impact of any skew.

Tracking our progress with our milestone table does not tell the complete picture of the project’s state, so ongoing reports will monitor the status of our goal test cases and their respective issue trackers. Blake3 is in a really good place right now: we have two open issues, one with the parser and one to finish implementing code generation for iterators. When we complete this milestone, we should have all the core features in place (minus bugs) to start trying to compile actual Rust code, so more emphasis will be placed on this and on testing moving forward. This is critical to ensure we have an accurate picture of the current status of the compiler so we can plan what needs to be done to make it usable.

Also, keep an eye on this YouTube channel (https://www.youtube.com/c/LiveEmbeddedEvent/videos): Philbert and Arthur both gave a talk on the compiler there, and we hope to see it uploaded soon.

Monthly Community Call

It’s time for our next community call. Feel free to join in! 🙂

GSoC 2022

Today we welcome our new Google Summer of Code 2022 students:

  • Faisal Abbas, who will be working with Philbert on porting over the constexpr support from the C++ front-end
  • Andrew A.N, who will be working with Arthur Cohen to develop our HIR dump, which will aid our compiler debugging experience

Thanks Google! Good luck students :).

Completed Activities

  • Fix size used in unsized adjustments PR1217
  • ast: lower: Refactor ASTLowerItem in its own source file PR1216
  • Report pub restricted violations PR1215
  • Replace SSH cloning with HTTPS cloning in README.md PR1214
  • intrinsic: add rotate_left and rotate_right intrinsic PR1213
  • intrinsic: add breakpoint intrinsic PR1212
  • Preserve inside_loop context when type checking match PR1211
  • Allow match on boolean expressions PR1209
  • Use correct format specifiers for unsigned HOST_WIDE_INT PR1206
  • Take advantage of OBJ_TYPE_REF’S in dyn calls PR1205
  • Resolve module visibility properly PR1202
  • Generic functions should not be TREE_PUBLIC PR1201
  • Remove duplicated code for expansions of types and expressions PR1200
  • Add new as_name interface for Dynamic types PR1199
  • Support recursive coercion sites PR1197
  • Resolve simple paths in use items PR1191
  • Lower IfLet expression to HIR PR1218
  • Add optional<T> for development in C++11 PR1219
  • Apply coercion sites on unions PR1220
  • Don’t return error_mark_node on LoopExpr’s PR1221
  • Add destructure for generics on coercion sites PR1222
  • Fix bad type resolution for associated types PR1223
  • Fix macro expansion on repetitions PR1225
  • Fix tests on i386 PR1228
  • Bit shifts need to cast the types PR1240
  • Fix ICE in repetition macro PR1242
  • Integers can be casted to pointers PR1243
  • Support match-expr on integers and chars PR1244
  • Add name-resolution on IfLet expression PR1241
  • Support reporting common privacy issues PR1246
  • Support Range expression in match-arms PR1248
  • Report privacy violations PR1246
  • Bug fix extern blocks defined within blocks PR1250
  • Do not rely on endianness for the testsuite tests PR1254
  • Handle more complex privacy violations PR1252
  • Inspect expressions for privacy violations PR1255
  • Report privacy violations within types PR1258
  • Support ArrayIndex expression in dead code analysis PR1284
  • Add new AST dump visitor PR1287
  • Add compiler build info to the Docker image PR1288
  • Fix bad name canonicalization in covariant types PR1293
  • Add name resolution to for loops PR1292
  • Add new mappings to support complex paths PR1294
  • Fix bad impl overlap check PR1291
  • Reformat copyright header PR1290

Contributors this month

Overall Task Status

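| Category    | Last Month | This Month | Delta |
|-------------|------------|------------|-------|
| TODO        | 131        | 145        | +14   |
| In Progress | 25         | 27         | +2    |
| Completed   | 366        | 389        | +23   |

GitHub Issues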

Test Cases

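| Category | Last Month | This Month | Delta |
|----------|------------|------------|-------|
| Passing  | 6038       | 6311       | +273  |
| Failed   |            |            |       |
| XFAIL    | 25         | 23         | -2    |
| XPASS    |            |            |       |

make check-rust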

Bugs

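| Category    | Last Month | This Month | Delta |
|-------------|------------|------------|-------|
| TODO        | 49         | 54         | +5    |
| In Progress | 12         | 12         |       |
| Completed   | 146        | 164        | +18   |

GitHub Bugs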

Milestone Progress

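| Milestone                         | Last Month | This Month | Delta | Start Date     | Completion Date | Target         |
|-----------------------------------|------------|------------|-------|----------------|-----------------|----------------|
| Data Structures 1 – Core          | 100%       | 100%       |       | 30th Nov 2020  | 27th Jan 2021   | 29th Jan 2021  |
| Control Flow 1 – Core             | 100%       | 100%       |       | 28th Jan 2021  | 10th Feb 2021   | 26th Feb 2021  |
| Data Structures 2 – Generics      | 100%       | 100%       |       | 11th Feb 2021  | 14th May 2021   | 28th May 2021  |
| Data Structures 3 – Traits        | 100%       | 100%       |       | 20th May 2021  | 17th Sept 2021  | 27th Aug 2021  |
| Control Flow 2 – Pattern Matching | 100%       | 100%       |       | 20th Sept 2021 | 9th Dec 2021    | 29th Nov 2021  |
| Macros and cfg expansion          | 100%       | 100%       |       | 1st Dec 2021   |                 | 28th Mar 2022  |
| Imports and Visibility            | 0%         | 48%        | +48   | 29th Mar 2022  |                 | 27th May 2022  |
| Const Generics                    | 0%         | 0%         |       | 30th May 2022  |                 | 25th Jul 2022  |
| Intrinsics                        | 0%         | 0%         |       | 6th Sept 2022  |                 | 30th Sept 2022 |

GitHub Milestones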

Risks

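| Risk                    | Impact (1-3) | Likelihood (0-10) | Risk (I * L) | Mitigation                                                   |
|-------------------------|--------------|-------------------|--------------|--------------------------------------------------------------|
| Rust Language Changes   | 3            | 7                 | 21           | Keep up to date with the Rust language on a regular basis    |
| Going over target dates | 3            | 5                 | 15           | Maintain status reports and issue tracking to stakeholders   |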

Cross testing project

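| Testsuite                 | Compiler            | Test cases | Passes | Failures |
|---------------------------|---------------------|------------|--------|----------|
| rustc testsuite           | gccrs -fsyntax-only | 15481      | 12783  | 2698     |
| gccrs testsuite           | rustc stable        | 563        | 390    | 173      |
| rustc testsuite           | gccrs               | 6603       | 877    | 5726     |
| rustc testsuite (no std)  | gccrs               | 2764       | 698    | 2066     |
| rustc testsuite (no core) | gccrs               | 178        | 145    | 33       |

https://github.com/Rust-GCC/testing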

System Integration Tests

Planned Activities

  • Finish complex paths such as super::super::super or the crate keyword.
  • Apply this to use statements
  • Read in the export data and test linking

Detailed changelog

Match on boolean expressions

Thanks to David Faust, the compiler is now able to match on boolean expressions on top of patterns (which were already handled):

let a = false;

match a {
    true => { /* ... */ },
    false => { /* ... */ },
}

This adds reusable code for the match arm possibilities that remain to be implemented, such as integers or strings.

pub(restricted) lints

As part of this milestone, it is important to resolve pub(restricted) items properly. pub(restricted) items are all items with a visibility modifier containing a path: this can be the often seen pub(crate) or more specific paths such as pub(in some::super::path).

These restrictions can only refer to valid modules that are ancestor modules:

mod sain {
    mod doux {
        mod graal { }

        struct A0;

        pub(in doux) struct A1; // valid
        pub(in sain::doux) struct A2; // valid

        pub(in sain::doux::A0) struct A3;
        // valid path, invalid restriction! This is a type, not a module

        pub(in sain::doux::graal) struct A4;
        // valid path, invalid restriction! This is a child module, not a parent

        pub(in not::exist::at_all) struct A5; // invalid path
    }
}

Note that we do not currently handle the differences between pub(restricted) in the 2015 and 2018 editions of the language. What we currently have is closer to the 2015 edition, and work on it will continue.

More compiler intrinsics

Thanks to the work done by liushuyu, our backend keeps getting extended with new attributes and intrinsics. This week, the compiler gained support for breakpoint, rotate_left and rotate_right.
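For readers unfamiliar with the rotate intrinsics, here is a minimal sketch on stable Rust. It uses the standard integer methods u32::rotate_left and u32::rotate_right rather than calling the intrinsics directly; the helper names rotl_byte and rotr_byte are ours and purely illustrative.

```rust
/// Rotate a 32-bit word left by one byte (8 bits).
/// Bits shifted out at the top re-enter at the bottom.
fn rotl_byte(x: u32) -> u32 {
    x.rotate_left(8)
}

/// Rotate a 32-bit word right by one byte (8 bits).
fn rotr_byte(x: u32) -> u32 {
    x.rotate_right(8)
}

fn main() {
    let word = 0x1234_5678u32;
    println!("{:#010x}", rotl_byte(word));
    println!("{:#010x}", rotr_byte(word));
}
```

Unlike a plain shift, a rotation never loses bits, so rotating left and then right by the same amount gives back the original value.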

Match Expression

Thanks to David Faust for adding more support in our Match expression so that we can now support matching integers, chars and ranges.

fn foo_u32 (x: u32) {
    match x {
        15 => {
            let a = "fifteen!\n\0";
            let b = a as *const str;
            let c = b as *const i8;
            printf (c);
        }

        _ => {
            let a = "other!\n\0";
            let b = a as *const str;
            let c = b as *const i8;
            printf (c);
        }
    }
}

const BIG_A: char = 'A';
const BIG_Z: char = 'Z';

fn bar (x: char) {
    match x {

        'a'..='z' => {
            let a = "lowercase\n\0";
            let b = a as *const str;
            let c = b as *const i8;
            printf (c);
        }
        BIG_A..=BIG_Z => {
            let a = "uppercase\n\0";
            let b = a as *const str;
            let c = b as *const i8;
            printf (c);
        }
        _ => {
            let a = "other\n\0";
            let b = a as *const str;
            let c = b as *const i8;
            printf (c);
        }
    }
}

More work is still to be done here to handle matching tuples and ADTs.

Bit shift operations cast

In Rust, arithmetic operations usually unify the types of their operands. Bit shift operations are a special case: the two operands may have different integer types, so instead of unifying them the compiler has to cast them.

fn foo() -> u8 {
    1u8 << 2u32
}
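As a small sketch of what this means in practice, the following stable Rust program shifts values whose operands have different integer types. The function names are hypothetical and only serve the illustration; in both cases the result keeps the type of the left-hand operand.

```rust
/// The left operand is a u8 and the shift amount is a u32.
/// The result keeps the type of the left operand.
fn shift_mixed() -> u8 {
    1u8 << 2u32
}

/// A wide value shifted by a narrow amount also type checks.
fn shift_wide(value: u64, amount: u8) -> u64 {
    value << amount
}

fn main() {
    println!("{}", shift_mixed());
    println!("{}", shift_wide(1, 40));
}
```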

Support casting integers to pointers

In embedded programming we often need to turn raw addresses into pointers. Supporting this required us to update our casting rules.

const TEST: *mut u8 = 123 as *mut u8;

fn test() {
    let a = TEST;
}
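A typical use of such casts is computing the address of a memory-mapped register. The sketch below assumes a made-up base address (UART_BASE) and a hypothetical helper register_ptr; it only forms and prints the pointer and never dereferences it, since the address is not backed by real hardware here.

```rust
/// Hypothetical base address of a memory-mapped peripheral.
const UART_BASE: usize = 0x4000_1000;

/// Turn a register offset into a raw pointer by casting an integer address.
fn register_ptr(offset: usize) -> *mut u8 {
    (UART_BASE + offset) as *mut u8
}

fn main() {
    let status = register_ptr(4);
    // Only the address is printed; the pointer is never dereferenced.
    println!("{:p}", status);
}
```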

Privacy violations

All of the efforts regarding the privacy pass in recent weeks have given us a solid privacy-reporting base. This will make it easy to report private items in public contexts, as well as to provide a variety of hints for a good user experience.

This first implementation concerns functions and function calls.

mod orange {
    mod green {
        fn sain() {}
        pub fn doux() {}
    }

    fn brown() {
        green::sain(); // error: The function definition is private in this context
        green::doux();
    }
}

We also support pub(restricted) visibilities seamlessly thanks to the work done in the past few weeks regarding path resolution:

mod foo {
    mod bar {
        pub(in foo) fn baz() {}
    }

    fn baz() {
        bar::baz(); // no error, foo::bar::baz is public in foo
    }
}

Privacy violations

Last week, the work done on the privacy reporting visitor was but a stepping stone for the current privacy pass: it could only handle function calls in simple blocks, and not in let statements or loops. Similarly, the “valid ancestor check” that we were performing, to see if an item’s definition module was an ancestor of the current module where said item is referenced, would only go “one step down” in the ancestry tree. This meant that the following Rust code

fn parent() {}

mod foo {
    mod bar {
        fn mega_child() {
            crate::parent();
        }
    }
}

would cause errors in our privacy pass, despite being perfectly valid. This is now handled, and the ancestry checks are performed recursively as they should be.
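To illustrate the idea behind the full ancestry check, here is a minimal sketch in Rust. It is not the gccrs implementation (which is written in C++): the ModuleTree type, its index-based representation and the method names are all assumptions made for this example, and the walk is written as a loop rather than a recursive function.

```rust
/// Hypothetical module tree: each module stores the index of its parent,
/// and index 0 is the crate root (which has no parent).
struct ModuleTree {
    parents: Vec<Option<usize>>,
}

impl ModuleTree {
    fn new() -> Self {
        ModuleTree { parents: vec![None] }
    }

    /// Register a new module under `parent` and return its index.
    fn add_child(&mut self, parent: usize) -> usize {
        self.parents.push(Some(parent));
        self.parents.len() - 1
    }

    /// True if `ancestor` is `module` itself or any module above it,
    /// however many levels up.
    fn is_ancestor(&self, ancestor: usize, module: usize) -> bool {
        let mut current = Some(module);
        while let Some(id) = current {
            if id == ancestor {
                return true;
            }
            current = self.parents[id];
        }
        false
    }
}

fn main() {
    let mut tree = ModuleTree::new();
    let foo = tree.add_child(0);
    let bar = tree.add_child(foo);
    // The crate root is two levels above `bar`, as in crate::parent() above.
    println!("{}", tree.is_ancestor(0, bar));
}
```

A check that only looks at the direct parent would reject the crate root as an ancestor of bar, which is the kind of false error described above.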

On top of reporting privacy errors in more expression places (if private_fn(), let _ = private_fn()…), we have also added privacy checks to explicit types. This means reporting errors for nice, simple private structures:

mod orange {
    mod green {
        struct Foo;
        pub(in orange) struct Bar;
        pub struct Baz;
    }

    fn brown() {
        let _ = green::Foo; // privacy error
        let _ = green::Bar;
        let _ = green::Baz;

        let _: green::Foo; // privacy error

        fn any(a0: green::Foo, a1: green::Bar) {}
        //         ^ privacy error
    }
}

As well as complex nested types inside arrays, tuples or function pointers.

More work will be coming regarding trait visibility, associated types, opaque types and so on.

Slice Type layout

We got slices typechecking and code generation working a few reports ago, but there was an issue in actually running code that used them. It boils down to the function below, where the range index trait function ends up creating a new FatPtr, which has the same layout as a slice. The interesting part here is that we create a new FatPtr object inside a union, then return the *const [T] variant to keep the typechecker happy. This code smells funny to C/C++ programmers since this object has been allocated on the stack.

struct FatPtr<T> {
    data: *const T,
    len: usize,
}

pub union Repr<T> {
    rust: *const [T],
    rust_mut: *mut [T],
    raw: FatPtr<T>,
}

const fn slice_from_raw_parts<T>(data: *const T, len: usize) -> *const [T] {
    unsafe {
        Repr {
            raw: FatPtr { data, len },
        }
        .rust
    }
}

It turns out that *const [T] or &mut [T] is not a simple pointer to a slice. The layout of a slice is actually a structure. You can see from the GCC code-gen GIMPLE dump (https://godbolt.org/z/Gq5EYdYcz) that the result of slice_from_raw_parts is not a pointer but a struct as well.

Overall:

  • *const [T]
  • *mut [T]
  • &mut [T]
  • &[T]

All have the same layout of struct { raw_data_ptr, len }, which ends up being twice the size of a normal pointer, so it can be easily handled by a compiler’s code generation. The other interesting piece we noticed during this investigation was that when you use GDB on Rust code and take the address of a normal array, GDB also implicitly treats this as a slice:

fn main() {
    let a = 123;
    let b: *const i32 = &a;
    let c = core::ptr::slice_from_raw_parts(b, 1);
}

Temporary breakpoint 1, rs_slice::main () at rs-slice.rs:2
2           let a = 123;
(gdb) n
3           let b: *const i32 = &a;
(gdb) n
4           let c = core::ptr::slice_from_raw_parts(b, 1);
(gdb) p a
$1 = 123
(gdb) p b
$2 = (*mut i32) 0x7fffffffd9d4
(gdb) n
6           let d = 123;
(gdb) p c
$3 = *const [i32] {data_ptr: 0x7fffffffd9d4, length: 1}
(gdb) p *c
Attempt to take contents of a non-pointer value.

Also, notice that you cannot dereference this *const [i32] in GDB since it’s a non-pointer value. See this compiler explorer link: https://godbolt.org/z/9xe4Wvs3e
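The size claim above can be checked on stable Rust with std::mem::size_of. The sketch below is only an observation of what current rustc does, not a guarantee from the language; the helper names pointer_words and one_element_slice are ours.

```rust
use std::mem::size_of;

/// Number of machine words occupied by a raw pointer to `T`.
fn pointer_words<T: ?Sized>() -> usize {
    size_of::<*const T>() / size_of::<usize>()
}

/// Build a one-element slice from a reference, in the spirit of the
/// slice_from_raw_parts call in the GDB session above.
fn one_element_slice(value: &i32) -> &[i32] {
    let raw: *const [i32] = std::ptr::slice_from_raw_parts(value as *const i32, 1);
    // SAFETY: `value` is a live reference, so `raw` covers exactly one
    // initialized i32 for as long as the returned slice is used.
    unsafe { &*raw }
}

fn main() {
    println!("thin pointer words: {}", pointer_words::<i32>());
    println!("slice pointer words: {}", pointer_words::<[i32]>());

    let a = 123;
    let s = one_element_slice(&a);
    println!("len = {}, first = {}", s.len(), s[0]);
}
```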

More info:

https://github.com/Rust-GCC/gccrs/commit/cd39861da5e1113207193bb8b3e6fb3dde92895f
https://doc.rust-lang.org/reference/dynamically-sized-types.html
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=672adac002939a2dab43b8d231adc1dc

Intrinsic access support:

The remaining issue we have is that Rust’s libcore describes SliceIndex access like this:

unsafe impl<T> SliceIndex<[T]> for usize {
    type Output = T;

    fn get(self, slice: &[T]) -> Option<&T> {
        unsafe { Option::Some(&*self.get_unchecked(slice)) }
    }

    unsafe fn get_unchecked(self, slice: *const [T]) -> *const T {
        // SAFETY: the caller guarantees that `slice` is not dangling, so it
        // cannot be longer than `isize::MAX`. They also guarantee that
        // `self` is in bounds of `slice` so `self` cannot overflow an `isize`,
        // so the call to `add` is safe.
        unsafe { slice.as_ptr().add(self) }
    }

    fn index(self, slice: &[T]) -> &T {
        // It works if you change this to unsafe { &*self.get_unchecked(slice) }
        // N.B., use intrinsic indexing
        &(*slice)[self]
    }
}

This ends up looking as though slice access is recursive, but this is not the case: Rust actually treats this as an intrinsic operation. For now we can work around this by changing the Rust code:

unsafe impl<T> SliceIndex<[T]> for usize {
    type Output = T;

    fn get(self, slice: &[T]) -> Option<&T> {
        unsafe { Option::Some(&*self.get_unchecked(slice)) }
    }

    unsafe fn get_unchecked(self, slice: *const [T]) -> *const T {
        // SAFETY: the caller guarantees that `slice` is not dangling, so it
        // cannot be longer than `isize::MAX`. They also guarantee that
        // `self` is in bounds of `slice` so `self` cannot overflow an `isize`,
        // so the call to `add` is safe.
        unsafe { slice.as_ptr().add(self) }
    }

    fn index(self, slice: &[T]) -> &T {
        unsafe { &*self.get_unchecked(slice) }
    }
}
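The same pointer-arithmetic approach can be tried outside of libcore. The sketch below is a standalone function on stable Rust, not the libcore code: index_via_unchecked is a hypothetical name, and unlike the workaround above it adds an explicit bounds check so that the unsafe block is sound for any input.

```rust
/// Index into a slice using pointer arithmetic instead of `slice[index]`.
fn index_via_unchecked<T>(slice: &[T], index: usize) -> &T {
    // Explicit bounds check (an addition for this sketch).
    assert!(index < slice.len(), "index out of bounds");
    // SAFETY: `index` is in bounds, so the offset pointer stays inside
    // the slice and points to an initialized element.
    unsafe { &*slice.as_ptr().add(index) }
}

fn main() {
    let values = [10, 20, 30];
    println!("{}", index_via_unchecked(&values, 1));
}
```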

More info:

https://users.rust-lang.org/t/why-this-does-not-lead-to-recursion/50306/3
Rust-GCC/gccrs#1269

Str type layout

Str represents the raw string type in Rust, which has specific type checking rules as it is another DST that happens to have the same layout as a slice. Below is an example which shows that borrowing has no effect on the type. The rules here are likely to affect all DSTs with regard to borrows and dereferences.

let a: &str = "TEST 1";
let b: &str = &"TEST 2";

Since str has the same layout as a slice, we can actually get the length of the string by transmuting it to a slice, which is what libcore does:

mod mem {
    extern "rust-intrinsic" {
        fn transmute<T, U>(_: T) -> U;
    }
}

extern "C" {
    fn printf(s: *const i8, ...);
}

struct FatPtr<T> {
    data: *const T,
    len: usize,
}

pub union Repr<T> {
    rust: *const [T],
    rust_mut: *mut [T],
    raw: FatPtr<T>,
}

impl<T> [T] {
    pub const fn len(&self) -> usize {
        unsafe { Repr { rust: self }.raw.len }
    }
}

impl str {
    pub const fn len(&self) -> usize {
        self.as_bytes().len()
    }

    pub const fn as_bytes(&self) -> &[u8] {
        unsafe { mem::transmute(self) }
    }
}

fn main() -> i32 {
    let t1: &str = "TEST1";
    let t2: &str = &"TEST_12345";

    let t1sz = t1.len();
    let t2sz = t2.len();

    unsafe {
        let a = "t1sz=%i t2sz=%i\n";
        let b = a as *const str;
        let c = b as *const i8;

        printf(c, t1sz as i32, t2sz as i32);
    }

    0
}

Which in turn generates the following GIMPLE:

__attribute__((cdecl))
struct &[u8] str::as_bytes (const struct  self)
{
  struct &[u8] D.253;

  {
    RUSTTMP.2 = transmute<&str, &[u8]> (self);
  }
  D.253 = RUSTTMP.2;
  return D.253;
}


struct &[u8] transmute<&str, &[u8]> (const struct  _)
{
  struct &[u8] D.255;

  D.255 = VIEW_CONVERT_EXPR<struct &[u8]>(_);
  return D.255;
}


__attribute__((cdecl))
usize T::len<u8> (const struct &[u8] self)
{
  union 
{
  struct *const [u8] rust;
  struct *mut [u8] rust_mut;
  struct test::FatPtr<u8> raw;
} D.257;
  usize D.258;

  {
    D.257.rust = self;
    RUSTTMP.4 = D.257.raw.len;
  }
  D.258 = RUSTTMP.4;
  return D.258;
}


__attribute__((cdecl))
usize str::len (const struct  self)
{
  usize D.260;
  struct 
{
  u8 * data;
  usize len;
} D.261;

  D.261 = str::as_bytes (self);
  D.260 = T::len<u8> (D.261);
  return D.260;
}


__attribute__((cdecl))
i32 test::main ()
{
  i32 D.263;
  const struct  t1;
  const struct  t2;
  const usize t1sz;
  const usize t2sz;

  try
    {
      t1.data = "TEST1";
      t1.len = 5;
      t2.data = "TEST_12345";
      t2.len = 10;
      t1sz = str::len (t1);
      t2sz = str::len (t2);
      {
        const struct  a;
        const struct  b;
        const i8 * const c;

        try
          {
            a.data = "t1sz=%i t2sz=%i\n";
            a.len = 16;
            b = a;
            c = b.data;
            _1 = (i32) t2sz;
            _2 = (i32) t1sz;
            printf (c, _2, _1);
          }
        finally
          {
            a = {CLOBBER};
            b = {CLOBBER};
          }
      }
      D.263 = 0;
      return D.263;
    }
  finally
    {
      t1 = {CLOBBER};
      t2 = {CLOBBER};
    }
}

https://godbolt.org/z/31PPz5b1x
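For comparison, the length computation from this section can be reproduced on stable Rust using the standard library instead of our reduced libcore. This is only a sketch: str_len_via_bytes is a hypothetical helper, and the size comparison reflects what current rustc does rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Compute the length of a string by going through its byte slice,
/// mirroring the str::len shown above.
fn str_len_via_bytes(s: &str) -> usize {
    s.as_bytes().len()
}

fn main() {
    let t1: &str = "TEST1";
    // The extra borrow yields a `&&str`, which coerces back to `&str`.
    let t2: &str = &"TEST_12345";

    println!("t1sz={} t2sz={}", str_len_via_bytes(t1), str_len_via_bytes(t2));
    println!("&str bytes={} &[u8] bytes={}", size_of::<&str>(), size_of::<&[u8]>());
}
```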
